IN THE CLAIMS: 


1. (currently amended) A video conferencing system comprising: 
a stationary image pickup device, remaining motionless
during operation, for generating image signals representative of
an image;

an audio pickup device for generating audio signals
representative of sound from an audio source; and

a multimodal integration architecture system adapted for 
processing said image signals and said audio signals to determine 
a direction of the audio source relative to a reference point, 
the system being adapted to determine the direction depending at
least at times on the image signals.

2. (original) The video conferencing system of claim 1 wherein
said multimodal integration architecture system further
comprises:

an audio source localization system; 

a computer vision person detection system; and 

a multimodal speaker detection system. 

3. (original) The video conferencing system of claim 2, further 
comprising an integrated housing for an integrated video 
conferencing system incorporating the image pickup device, the 
audio pickup device, and the multimodal integration architecture 
system. 


4. (original) The video conferencing system of claim 3, wherein 
the integrated housing is sized for being portable.

5. (currently amended) The video conferencing system of claim
1, further comprising an electronic pan tilt zoom system for
electronically manipulating the image signals to effectively 
provide at least one of variable pan, tilt, and zoom functions. 

6. (currently amended) The video conferencing system of claim
1, wherein the image pickup device is a stationary camera.

7. (currently amended) The video conferencing system of claim
1, wherein the multimodal integrated architecture system provides
control signals to an electronic pan tilt zoom system.

8. (currently amended) The video conferencing system of claim
2, wherein the audio source localization system detects the movement
of the audio source when the audio source moves relative to the
reference point, and, in response to the movement, the audio
source localization system causes a change in a field of view
of the image pickup device.


9. (currently amended) The video conferencing system of claim
1, wherein the audio pickup device is comprised of an array of
two microphones.

10. (currently amended) A method comprising the steps of: 

generating, at a stationary image pickup device, remaining
motionless during operation, image signals representative of an
image;

generating, at an audio pickup device, audio signals
representative of sound from an audio source;

processing the image signals and the audio signals to
determine a direction of the audio source relative to a reference
point, the determination depending at least at times on the image
signals;

manipulating the image signals to produce refined image
signals depending on the determined direction; and

outputting said refined image signals. 

11. (original) The method of claim 10 further comprising the
steps of:

applying said audio signals to an audio source localization
system;

applying said image signals to a computer vision person
detection system;

processing said audio signals and said image signals with a
multimodal speaker detection system to determine the direction of
the audio source;

generating control signals based on the determined direction
of the audio source, the determination depending at least at
times on the image signals;

applying the control signals to an electronic pan tilt zoom
system to mimic the effect of at least one function of a movable
camera, said function selected from the group consisting of
panning, tilting, and zooming said movable camera; and

providing an output from said electronic pan tilt zoom
system.

12. (original) The method of claim 10, wherein manipulating the
image signals includes electronically varying
a field of view of the image pickup device in response to the
control signals.

13. (original) The method of claim 10, wherein processing the
audio signals includes determining an audio based direction of
the audio source based on the audio signals.

14. (currently amended) The method of claim 10, wherein
processing the audio signals further includes detecting the
movement of the audio source when the audio source moves; and

manipulating the image signals includes causing
electronically, in response to the movement, a
variation in a field of view of the image pickup device.

15. (original) The method of claim 13, wherein processing the
image signals includes generating
control signals depending on the audio based
direction, and manipulating the image includes electronically
panning, tilting, and/or zooming said image pickup device
depending on the control signals.

16. (currently amended) A video conferencing system comprising:
two microphones for generating audio signals representative
of sound from a speaker;

a stationary video camera, remaining motionless during
operation, for generating video signals representative of a video
image;

an electronic pan tilt zoom system for manipulating video
images to produce the visual effects of panning, tilting, and/or
zooming;

a processor for processing the video signals and the audio
signals to determine a direction of a speaker relative to a
reference point and supplying control signals to the electronic
pan tilt zoom system for producing images that include the
speaker in the field of view of the camera, the determination of
direction depending at least at times on the video signals, the
control signals being generated based on the determined direction
of the speaker; and

a transmitter for transmitting audio and video signals for 
video conferencing. 

17. (new) The video conferencing system of claim 1, wherein at
times the determination of the direction of the audio source 
depends on both the image signals and the audio signals. 

18. (new) The video conferencing system of claim 1, wherein the
processing includes determining the movement of the audio source 
depending at least at times on the image signals. 

19. (new) The video conferencing system of claim 1, wherein the 
processing includes tracking the position of the audio source 
when the audio source moves, the tracking depending at least at
times on the image signals.

20. (new) The video conferencing system of claim 2, wherein the 
computer vision person detection system detects the movement of 
the audio source when the audio source moves relative to the
reference point, and, in response to the movement, the computer
vision person detection system causes a change in a field of view
of the image pickup device.


21. (new) The method of claim 10, wherein processing the image
signals further includes:

detecting the movement of the audio source when the audio
source moves; and

causing electronically, in response to the movement, a
variation in a field of view of the image pickup device.


22. (new) The method of claim 10, wherein the processing includes
determining the movement of the audio source depending at least
at times on the image signals.


23. (new) The method of claim 10, wherein the processing includes
tracking the position of the audio source when the audio source
moves, the tracking depending at least at times on the image
signals.


24. (new) A video conferencing system, comprising:
a stationary image pickup device, remaining motionless
during operation, for generating image signals representative of
an image;

an audio pickup device for generating audio signals
representative of sound from an audio source;

means for processing the image signals and the audio signals
to determine a direction of the audio source relative to a
reference point, the determination depending at least at times on
the image signals;

means for manipulating the image signals to produce refined
image signals depending on the determined direction; and

an output for outputting said refined image signals.